Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is one of the most widely used dimensionality reduction techniques in machine learning and data science. It transforms the original features into a new set of uncorrelated features called principal components.
Core Concept
PCA finds directions (principal components) of maximum variance in the high-dimensional data and projects the data onto these directions. These principal components are:
- Orthogonal to each other (uncorrelated)
- Ordered by the amount of variance they capture
- Linear combinations of the original features
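As a quick illustration of these properties, here is a minimal sketch using scikit-learn's PCA class (the toy data below is made up; any numeric dataset works):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                # toy data: 200 samples, 3 features

pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)         # variance captured, in decreasing order
print(pca.components_ @ pca.components_.T)   # ~identity matrix: components are orthogonal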
Mathematical Foundation
PCA is based on the eigenvectors and eigenvalues of the data's covariance matrix:
- Calculate the covariance matrix of the standardized data
- Calculate the eigenvectors and eigenvalues of the covariance matrix
- Sort the eigenvectors by their eigenvalues in descending order
- Choose the top k eigenvectors to form the projection matrix
- Transform the original data using the projection matrix
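In matrix form, with X the standardized n × d data matrix, these steps amount to solving the eigenvalue problem of the covariance matrix:

C = (1 / (n − 1)) · XᵀX,  with  C vᵢ = λᵢ vᵢ

Each eigenvector vᵢ is a principal direction, and its eigenvalue λᵢ equals the variance of the data along that direction.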
Algorithm Steps
1. Standardize the data (zero mean, unit variance)
   - For each feature, subtract the mean and divide by the standard deviation
2. Compute the covariance matrix
   - Calculate how each feature varies with respect to every other feature
3. Calculate eigenvectors and eigenvalues
   - Eigenvectors determine the directions of the new feature space
   - Eigenvalues determine the magnitude of variance in each direction
4. Sort eigenvectors by eigenvalues
   - Higher eigenvalues indicate directions with greater variance
5. Select the top k eigenvectors
   - Choose the number of dimensions to keep based on explained variance
6. Project the data onto the new subspace
   - Transform the original data into the reduced-dimensional space
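Each of these steps is implemented, in order, in the code example at the end of this section.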
Example
Consider a dataset with 3 features (height, weight, age) for various individuals. After applying PCA:
| Original Features | Principal Components |
|---|---|
| Height (H) | PC1 = 0.7H + 0.65W + 0.3A |
| Weight (W) | PC2 = 0.2H - 0.3W + 0.93A |
| Age (A) | PC3 = 0.68H - 0.7W - 0.2A |
If PC1 captures 75% of the variance and PC2 captures 20%, we might keep just these two components, reducing the data from 3D to 2D while retaining 95% of the total variance.
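To make the projection concrete, here is a small sketch using the loadings from the table (illustrative round numbers, not an exactly orthonormal set; the sample values are made up):

import numpy as np

# Columns hold the PC1 and PC2 loadings for (Height, Weight, Age) from the table above
W = np.array([[0.70,  0.20],
              [0.65, -0.30],
              [0.30,  0.93]])

x = np.array([1.2, 0.4, -0.5])   # one individual's standardized (H, W, A)
scores = x @ W                   # 2D representation: [PC1 score, PC2 score]
print(scores)                    # ≈ [0.95, -0.345]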
Selecting the Number of Components
Several approaches can be used to determine how many principal components to retain:
1. Explained Variance Ratio
- Plot the cumulative explained variance vs. number of components
- Choose k where the curve begins to flatten (elbow method)
- Alternatively, select k to reach a threshold (e.g., 95% variance explained); see the sketch after this list
2. Kaiser's Rule
- Keep only components with eigenvalues greater than 1 (this assumes standardized data, where each original variable has variance 1)
- These components explain more variance than any single original variable
3. Scree Plot
- Plot eigenvalues in descending order
- Look for an "elbow" in the plot where eigenvalues drop off
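A minimal sketch of the first two criteria, assuming eigenvalues holds the eigenvalues of the covariance matrix sorted in decreasing order (as computed in the code example at the end of this section):

import numpy as np

explained = eigenvalues / eigenvalues.sum()        # explained variance ratio per component
cumulative = np.cumsum(explained)

# Threshold rule: smallest k that reaches 95% of the total variance
k_threshold = int(np.argmax(cumulative >= 0.95)) + 1

# Kaiser's rule: count components with eigenvalue > 1 (standardized data)
k_kaiser = int((eigenvalues > 1).sum())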
Advantages of PCA
- Reduces overfitting by decreasing model complexity
- Improves algorithm performance by removing multicollinearity
- Decreases computational requirements for subsequent processing
- Allows visualization of high-dimensional data in 2D or 3D
- Removes noise when the retained components capture the signal and the discarded ones mostly capture noise
Limitations of PCA
- Linear transformation only (can't capture non-linear relationships)
- Principal components are difficult to interpret, since each one blends all of the original features
- Sensitive to feature scaling (requires standardization)
- May lose important information if choosing too few components
- Assumes orthogonality of components, which may not reflect reality
Applications
- Image compression and facial recognition
- Gene expression analysis in bioinformatics
- Noise reduction in signals
- Feature extraction for machine learning models
- Exploratory data analysis and visualization
- Anomaly detection in high-dimensional data
Code Example (Python / NumPy)
import numpy as np

# Assumes X is an (n_samples, n_features) array and k is the number of components to keep

# Standardize the data (zero mean, unit variance per feature)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Calculate the covariance matrix (dividing by n - 1 gives the unbiased estimate)
n = X_std.shape[0]
cov_matrix = (X_std.T @ X_std) / (n - 1)

# Calculate eigenvectors and eigenvalues; eigh exploits the symmetry of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort eigenvectors by decreasing eigenvalues
sorted_indices = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_indices]
eigenvectors = eigenvectors[:, sorted_indices]

# Select the top k eigenvectors to form the projection matrix
W = eigenvectors[:, :k]

# Transform the data into the k-dimensional subspace
X_reduced = X_std @ W
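np.linalg.eigh is used above because the covariance matrix is symmetric; it is numerically more stable than the general-purpose eig and returns real eigenvalues (in ascending order, hence the reversal). Library implementations such as scikit-learn's PCA compute the same projection via the singular value decomposition of X_std, which avoids forming the covariance matrix explicitly.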